May 8, 2025
Past
Present
FUTURE
Photo of Weobley High School / IMD
Family is the foundation on which many good things are built.
Reproducibility is a continuous variable (Peng 2011)
Source: Raff (2023)
Time
Know-how
Lack of permission
Software is not open
Data is not open access
Someone might use it in unethical ways
Someone might “steal” the work
Lovelace, Tennekes, and Carlino (2022)
Illustration of the ClockBoard zoning system used to visualize a geographically dependent phenomenon: air quality, measured as the concentration of PM10 particles in micrograms per cubic meter, from the London Atmospheric Emissions Inventory (LAEI). The facets show the data on the spatial grid available from the LAEI (A), aggregated to London boroughs (B), aggregated to ClockBoard zones covering all the input data (C), and aggregated to ClockBoard zones clipped by the administrative boundary of Greater London (D).
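The ClockBoard geometry is simple enough to sketch in a few lines of base R, which is part of what makes it transferable to any city with a known centre. The sketch below is an illustration under stated assumptions, not the zonebuilder implementation: it assumes ring radii follow the triangular numbers (1, 3, 6, 10, 15 km), that every ring outside the central circle is split into twelve 30-degree segments centred on clock hours, and that points are given as east/north offsets in km from the centre. The function name `clockboard_label()` and the "A" / "B03" label format are hypothetical.

```r
# Sketch of ClockBoard-style zone labelling (assumptions, not zonebuilder code):
# - ring radii follow triangular numbers: 1, 3, 6, 10, 15 km
# - the central circle is one zone ("A"); outer rings get 12 clock segments
# - the label format (ring letter + two-digit clock hour) is hypothetical
clockboard_label = function(dx_km, dy_km, radii = c(1, 3, 6, 10, 15)) {
  # Distance from the centre, with dx pointing east and dy pointing north
  dist = sqrt(dx_km^2 + dy_km^2)
  # Ring index: 1 = central circle, 2 = first ring, ...; NA beyond outer radius
  ring = findInterval(dist, c(0, radii))
  ring[ring > length(radii)] = NA
  # Bearing in degrees, measured clockwise from north
  bearing = (atan2(dx_km, dy_km) * 180 / pi) %% 360
  # 12 segments of 30 degrees, each centred on a clock hour (12 = north)
  hour = floor(((bearing + 15) %% 360) / 30)
  hour[hour == 0] = 12
  ifelse(
    is.na(ring), NA_character_,
    ifelse(ring == 1, "A", paste0(LETTERS[ring], sprintf("%02d", hour)))
  )
}

clockboard_label(c(0.5, 2, 0, -7, 20), c(0, 0, -4, 0, 0))
```

With these assumptions the example returns "A" (inside the central circle), "B03" (2 km due east), "C06" (4 km due south), "D09" (7 km due west) and NA for a point 20 km out, beyond the outermost ring.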
Premise: A key reason for reproducibility is generalisability.
options(timeout = 600) # 10 minutes
u1 = "https://movilidad-opendata.mitma.es/estudios_basicos/por-distritos/viajes/ficheros-diarios/2024-03/20240301_Viajes_distritos.csv.gz"
f1 = basename(u1)
if (!file.exists(f1)) {
download.file(u1, f1)
}
drv = duckdb::duckdb("daily.duckdb")
con = DBI::dbConnect(drv)
# Query the compressed CSV lazily through DuckDB (nothing is loaded into R yet)
od_database = duckdb::tbl_file(con, f1)
Credit: Egor Kotov
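library(dplyr)
# Process the data: dplyr verbs are translated to SQL and run inside DuckDB;
# collect() brings only the aggregated result into R
od_large = od_database |>
  group_by(origen, destino) |>
  summarise(Trips = sum(viajes), .groups = "drop") |>
  filter(Trips > 500) |>
  collect() |>
  arrange(desc(Trips))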
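What the `group_by()` / `summarise()` / `filter()` pipeline computes can be checked on a small in-memory example. The sketch below uses base R's `aggregate()` on hypothetical toy data (values invented for illustration) that mimic the `origen`, `destino` and `viajes` columns: trips are summed per origin-destination pair, and only pairs with more than 500 trips are kept, largest first.

```r
# Hypothetical toy OD records: several rows per origin-destination pair
# (e.g. different hours or trip purposes), mirroring origen/destino/viajes
od_toy = data.frame(
  origen  = c("a", "a", "a", "b", "b", "c"),
  destino = c("b", "b", "c", "a", "a", "c"),
  viajes  = c(300, 250, 100, 450, 100, 900)
)
# Base R equivalent of group_by(origen, destino) |> summarise(Trips = sum(viajes))
od_totals = aggregate(viajes ~ origen + destino, data = od_toy, FUN = sum)
names(od_totals)[names(od_totals) == "viajes"] = "Trips"
# Equivalent of filter(Trips > 500) |> arrange(desc(Trips))
od_big = od_totals[od_totals$Trips > 500, ]
od_big = od_big[order(-od_big$Trips), ]
od_big
```

Note that the intrazonal pair (`c` to `c`, 900 trips) is the largest flow in the toy result: trips within a zone pass the threshold just like trips between zones, so they need to be filtered out separately before drawing desire lines.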
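# Keep only interzonal flows (origin differs from destination)
od_large_interzonal = od_large |>
  filter(origen != destino)
# Convert to geo with {od} package
# (distritos_wgs84, the district zones, is assumed to be loaded already)
library(ggplot2)
od_large_interzonal_sf = od::od_to_sf(
  od_large_interzonal,
  z = distritos_wgs84
)
od_large_interzonal_sf |>
  ggplot() +
  geom_sf(aes(size = Trips), colour = "red") +
  theme_void()
od_salamanca = od_database |>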
  filter(origen %in% ids_salamanca) |>
  filter(destino %in% ids_salamanca) |>
  group_by(origen, destino) |>
  summarise(Trips = sum(viajes), .groups = "drop") |>
  collect() |>
  arrange(Trips)
od_salamanca_sf = od::od_to_sf(
  od_salamanca,
  z = distritos_salamanca
)
od_salamanca_sf |>
  filter(origen != destino) |>
  ggplot() +
  geom_sf(aes(colour = Trips), size = 1) +
  scale_colour_viridis_c() +
  theme_void()
od_jittered = odjitter::jitter(
  od_salamanca_sf,
  zones = distritos_salamanca,
  subpoints = drive_net,
  disaggregation_threshold = 1000,
  disaggregation_key = "Trips"
)
od_jittered |>
  arrange(Trips) |>
  ggplot() +
  geom_sf(aes(colour = Trips), size = 1) +
  scale_colour_viridis_c() +
  geom_sf(data = drive_net_major, colour = "black") +
  theme_void()
The package has been onboarded to the rOpenSpain public-benefit data science community (see ropenspain.github.io).
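The `disaggregation_threshold` and `disaggregation_key` arguments in the `odjitter::jitter()` call control how large flows are split before jittering. A minimal base R sketch of that splitting step is shown below. It assumes that a row whose key value exceeds the threshold is divided into `ceiling(value / threshold)` rows that share the value equally; this is an assumption about the behaviour, not the odjitter source, and `disaggregate_od()` is a hypothetical helper. The subsequent random sampling of new start and end points (for example on `drive_net`) is not shown.

```r
# Sketch of OD disaggregation (assumed behaviour, not odjitter's code):
# rows whose `key` value exceeds `threshold` are split into
# n = ceiling(value / threshold) rows that share the value equally
disaggregate_od = function(od, key = "Trips", threshold = 1000) {
  # Number of output rows per input row (at least one, even for zero trips)
  n = pmax(1, ceiling(od[[key]] / threshold))
  # Repeat each row n times and divide its trips evenly among the copies
  out = od[rep(seq_len(nrow(od)), times = n), , drop = FALSE]
  out[[key]] = rep(od[[key]] / n, times = n)
  rownames(out) = NULL
  out
}

# Hypothetical toy input: one small pair and one large pair
od_toy = data.frame(
  origen = c("a", "b"),
  destino = c("b", "c"),
  Trips = c(400, 2500)
)
disaggregate_od(od_toy)
```

In the toy example the 2,500-trip pair becomes three rows of about 833 trips each, so the total (2,900 trips) is preserved while no single row exceeds the 1,000-trip threshold.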
Source: (Lovelace, Félix, and Carlino 2022)
“In essence ‘open access’ goes beyond ‘open source’ in that users are not only given the option of viewing (potentially indecipherable) source code, but are encouraged to do so, with measures taken in the software itself, and the community that builds it, to make it more user-friendly.”
Source: (Lovelace, Parkin, and Cohen 2020)
Source: screenshot from development version of open source and open access Network Planning Tools for Scotland: https://nptscot.github.io/#/rnet/#9.29/55.9882/-3.4379